LCE: a link-based cluster ensemble method for improved gene expression data analysis
Abstract
MOTIVATION: It is far from trivial to select the most effective clustering method and its parameterization for a particular set of gene expression data, because there are a very large number of possibilities. Although many researchers still prefer to use hierarchical clustering in one form or another, this is often sub-optimal. Cluster ensemble research addresses this problem by automatically combining multiple data partitions from different clusterings to improve both the robustness and the quality of the clustering result. However, many existing ensemble techniques use an association matrix to summarize sample-cluster co-occurrence statistics, so relations within an ensemble are encapsulated only at a coarse level, while relations among clusters are completely neglected. Discovering these missing associations may greatly extend the capability of the ensemble methodology for microarray data clustering.
RESULTS: The link-based cluster ensemble (LCE) method, presented here, implements these ideas and demonstrates outstanding performance. Experimental results on real gene expression and synthetic datasets indicate that LCE: (i) usually outperforms the existing cluster ensemble algorithms in individual tests and, overall, is clearly class-leading; (ii) generates excellent, robust performance across different types of data, especially in the presence of noise and imbalanced data clusters; (iii) provides a high-level data matrix that is applicable to many numerical clustering techniques; and (iv) is computationally efficient for large datasets and gene clustering.
AVAILABILITY: Online supplementary material and an implementation are available at: http://users.aber.ac.uk/nii07/bioinformatics2010.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
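To make the "association matrix" of sample-cluster co-occurrence statistics concrete, the following is a minimal sketch of the conventional ensemble summary that the abstract contrasts LCE with. It is not the LCE method itself. The function names `binary_association` and `co_association` and the three toy base partitions are assumptions introduced purely for illustration.

```python
import numpy as np

def binary_association(partitions):
    """Stack one-hot encodings of each base partition.

    Rows are samples, columns are clusters from all base clusterings.
    Entry (i, c) is 1 if sample i belongs to cluster c, else 0.
    """
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)

def co_association(ba, n_partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    return ba @ ba.T / n_partitions

# Three hypothetical base clusterings of six samples (illustrative data only).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 0, 1, 1],
]
ba = binary_association(partitions)
ca = co_association(ba, len(partitions))
```

Every row of `ba` contains exactly one 1 per base clustering and 0 everywhere else, so a sample carries no information about clusters it does not belong to, however similar those clusters may be to its own. This is the coarse, cluster-relation-free summary described in the motivation.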
Related resources
Weighted delta factor cluster ensemble algorithm for categorical data clustering in data mining
Though many cluster ensemble approaches have come forward as a potential and dominant method for enhancing the robustness, stability and quality of individual clustering systems, it is widely observed that this approach in most cases generates a final data partition with deficient information. The primary ensemble information matrix generated in the traditional cluster ensemble approaches resu...
A Link-Based Cluster Ensemble Approach for Improved Gene Expression Data Analysis
It is difficult to select the most suitable and effective clustering algorithm for a given set of gene expression data, because there are a huge number of options and a huge number of gene expressions. At present many researchers prefer to use hierarchical clustering in various forms, although this is often not optimal. Cluster ensemble research can solve ...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Pervasive Correlated Evolution in Gene Expression Shapes Cell and Tissue Type Transcriptomes
The evolution and diversification of cell types is a key means by which animal complexity evolves. Recently, hierarchical clustering and phylogenetic methods have been applied to RNA-seq data to infer cell type evolutionary history and homology. A major challenge for interpreting this data is that cell type transcriptomes may not evolve independently due to correlated changes in gene expression...
Clustering Gene Expression Profiles Using Mixture Model Ensemble Averaging Approach
Clustering has been an important tool for extracting underlying gene expression patterns from massive microarray data. However, most of the existing clustering methods cannot automatically separate noise genes, including scattered, singleton and mini-cluster genes, from other genes. Inclusion of noise genes into regular clustering processes can impede identification of gene expression patterns....
Journal title:
- Bioinformatics
Volume 26, Issue 12
Publication date: 2010